Ruta chalepensis L., known as “fringed rue”, is one
of the most spread medicinal plants of the Ruta genus. It is an
evergreen subshrub native to Mediterranean region; however, it may be found in
several other parts of the world (Yoshida et al. 2019). R. chalepensis
is among the richest sources of different important natural products e.g.
flavonoids, alkaloids, saponins, coumarins, cardiac glycosides, terpenoids,
tannins, and anthraquinones (Emam et al. 2010; Al-Majmaie et al. 2019). The plant is
largely used in the traditional medicine in several different countries
including Saudi Arabia (Günaydin and Savci 2005).
Different parts of this plant are commonly used for the treatment of menstrual
and labor pains, eye problems, some nervous disease, and intestinal and stomach
problems (Loizzo et al. 2018). In some countries, the plant is used to
enhance the flavor of some food and drinks. Because of its high content of
flavonoids, especially rutin (Loizzo et al. 2018), R. chalepensis
attracted many studies for its potential medical importance.
Rutin is a major flavonoid found in R.
chalepensis with a wide number of pharmacological activities (Loizzo et al.
2018). In central nervous system, rutin showed protective effect against
brain ischemia and is useful in the treatment of Alzheimer’s disease by
inhibiting cytotoxicity of β-amyloid oligomeric (Ganeshpurkar and Saluja 2017; Pan
et al. 2019). Rutin has significant antibacterial, antimalarial, as
well as anti-diabetic potential, whereas application of rutin on human leukemia
cell line (HL-60) showed a significant reduction in tumor size indicates its
potential antileukemic effect (Ganeshpurkar and
Saluja 2017). Interestingly, rutin was reported to have ameliorative
effects against toxicity induced by cancer chemotherapy including cisplatin (Almutairi et
al. 2017; Jahan et al. 2018),
idarubicin (Deihimi et al. 2017), and methotrexate (Gautam et al. 2016).
Moreover, rutin showed hepatoprotective (Khan et al. 2012a; Hafez et al. 2015) and nephroprotective (Khan et al. 2012b; Azevedo et al. 2013; Duarte et al. 2014) activities. The major plant source of rutin
production is Fagopyrum tataricum Gaertn. Using R. chalepensis
plants for different medicinal purposes needs more understanding of the genes
responsible for the biosynthesis of different chemical compounds especially
flavonoids e.g. rutin. Until now, rutin biosynthesis was studied in some plants
e.g. Asparagus officinalis and F. tataricum (Huang et al.
2016; Yi et al. 2019).
However, there are no studies available regarding the biosynthesis pathway of
rutin in Ruta genus, especially R. chalepensis.
Next-generation sequencing (NGS) has greatly
increased the opportunity to study the plant transcriptome to assess the
genetic changes behind the biosynthesis of active compounds inside the plant.
NGS has improved the quality of nucleic acids sequencing and decreased its cost
very significantly. Transcriptomic studies using NGS produce a huge amount of
transcript sequences data which can be used for different research purposes.
RNA-sequencing (RNA-Seq) is one of the most important applications of NGS. This
technique is typically used to quantify the changes in the expression levels of
different transcripts under different conditions, at different developmental
stages, or in different organs/tissues of the plant (Wang et al. 2009).
RNA-Seq technique was utilized to study the whole transcriptome of different
plants and to identify the expression levels of genes involved in biosynthesis
of their active components in different plat organs.
One of the major obstacles hindering the
application of RNA-Seq workflow on different organisms is the absence of
reference genome, especially in organisms having complex genomes with high
levels of repetitive DNA such as plants (Kerr et al. 2019). However,
development of de novo transcriptome assembly algorithms and tools
enabled researchers to perform different kind of analyses using RNA-Seq data in
non-model organisms with no reference genome available (Moreton et al. 2016; Ungaro et al. 2017). Moreover, de
novo assembly of transcriptome of organisms with or without reference
genome provides deep insights on the transcriptional changes and, thus,
biological variations in different organs or developmental stages of the
non-model species. De novo transcriptomic studies aimed to examine the
genes involved in biosynthesis of active compounds were reported on different
plants species e.g. Agave hybrid H11648 (Huang et al. 2019), Silene uralensis
and S. schafta (Bertrand et al. 2018), Lithospermum
officinale (Rai et al. 2018), and S. dioica (Cegan et al.
2017). This study was performed to generate de novo assembly of R.
chalepensis whole transcriptome based on data obtained from different
organs and functionally annotate the transcriptome to identify genes involved
in rutin biosynthesis. Our results would lay the foundation for further
genomics and transcriptomic studies on R. chalepensis and help in
dentification and genetic modification of genes involved in the biosynthesis of
secondary metabolites in medicinal plants.
R. chalepensis seedlings were obtained from a nursery in
Riyadh, Saudi Arabia. Seedlings were grown in a growth chamber affiliated with
our lab in the Department of Botany and Microbiology, College of Sciences, King
Saud University. They were grown for 3 months at 25 ± 2°C with 50 µmol m-2
s-1 light provided by cool white fluorescent lamps for 16/8 h
day/night. Seedlings were maintained in 30-cm diameter pots containing German
potting soil mix and irrigated day after day with 200 mL of distilled water.
Plants were fertilized monthly with NPK (20-5-5+TE) fertilizer.
After 3 months of growth, R. chalepensis
plants were carefully removed from the soil to avoid damaging roots as
possible. Leaves, stems, and roots were separated from each other and rinsed
with ultrapure Milli-Q water (Elix® Advantage, Merck KGaA,
Darmstadt, Germany) three times at least. Roots were washed in a sink with tap
water to remove all the remaining soil particles before rinsing with Milli-Q
water. After complete air drying, 1 g of fresh tissues were collected from
leaves, stems, and roots and instantly frozen in liquid nitrogen (-196°C) for
further RNA isolation.
RNA was isolated from three biological replicates of leaves, stems, and
roots tissues collected from three different R. chalepensis
plants using RNeasy® Plant Mini Kit (Cat No. 74904, QIAGEN, GmbH, D-40724
Hilden, Germany). Preliminary quality check (QC) of isolated RNA was performed
using Agarose Gel Electrophoresis. RNA quantity and purity were checked using
Thermo Scientific NanoDrop 2000c UV-Vis Spectrophotometer (Thermo Fisher
Scientific Inc., MA, USA). RNA integrity of the samples was checked using
Agilent 2100 Bioanalyzer (Agilent Technologies, CA, USA).
After assuring the quality of RNA samples, RNA libraries were
constructed using NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (New
England Biolabs, MA, USA). The quality of the constructed libraries was then
checked using Qubit® 2.0 Fluorometer (Thermo Fisher Scientific Inc., MA, USA),
Agilent 2100 Bioanalyzer, and quantitative-PCR (Q-PCR). The qualified libraries
were fed into NovaSeq 6000 sequencers (Illumina, CA, USA) after pooling
according to its effective concentration and expected data volume. Sequencing was
performed to produce 6 Gb of data per sample using Illumina PE sequencing
methods with a length of 150 bp with an insertion size of 250 ~ 300 bp.
After obtaining FASTQ sequence files, their quality was analyzed using
FASTQC v0.11.9 software
(https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Raw data were
filtered to remove reads containing adapters, reads containing more than 10% of
N bases, and low-quality reads. Cutadapt tool v2.8 (Martin 2011) was used to filter the reads. RNA-seq error Corrector
(Rcorrector) tool (Song and Florea 2015)
was used to perform k-mer correction of filtered PE FASTQ files using default
k-mer length. Filtered and corrected FASTQ files (clean reads) were then used
for de novo transcriptome assembly and functional annotation.
Since there is no available reference genome for R. chalepensis
plant, de novo transcriptome assembly was performed to obtain a
reference that could be used for further functional annotation of potential
genes found in this plant. For preparation of reads for assembly using Linux
command line tools, all the left clean reads from different tissues and
replicates were combined in one file and, similarly, all right reads were
combined in one file for assembly. Trinity assembler v2.8.4 (Grabherr et al.
2011) was used for de novo assembly of RNA-Seq data obtained in
the current study in PE mode with in silico normalization and minimum
assembled contig list of 200. Quality of the assembled transcriptome was
checked via counting number of assembled transcripts that seemed to be
full-length or nearly full-length, examining read representation in the
assembly, OrthoDB’s sets of Benchmarking Universal Single-Copy Orthologs
“BUSCO” tool suite (http://busco.ezlab.org/) (Seppey et al. 2019), and investigating
the strand specificity of the generated data.
Annocript pipeline v2.0.1 (Musacchia et al. 2015) was used to
annotate the de novo assembled R. chalepensis whole
transcriptome. All the assembled transcripts were searched against several databases
using different BLAST tools. BLASTX was used to search the assembled
transcripts against SwissProt and UniRef90 databases. RPSTBLASTN was used to
reveal the Conserved Domains within each transcript via searching transcripts
against Conserved Domains Database (CDD; https://www.ncbi.nlm.nih.gov/cdd/).
For translation of nucleotide sequences into their respective amino acids,
Virtual Ribosome (dna2pep) v1.1 (Wernersson 2006)
was used.
Gene ontology (GO), enzyme, and pathway annotation: GO functional
annotation were performed based the results of both proteins annotated against
UniRef90 database and domains annotated against CDD database. Similarly, Enzyme
Commission (EC) IDs were assigned for each annotated sequence based on
SwissProt results. Moreover, annotated sequencing against SwissProt database
were assigned their respective pathway information based on pathway database
v2019_11 provided by UniProt – SwissProt Protein Knowledgebase (Morgat et al.
2011).
lncRNAs identification: BLASTN was used to
search the assembled sequences against RNAcentral database (The
Rnacentral Consortium 2018) that contained information from more than 40
databases to identify potential lncRNAs, ribosomal RNAs (rRNAs). Next, coding potential score for each
putative lncRNA was calculated using Coding Potential Calculator v2.0 (CPC2) software (Kang et al.
2017). Finally, RNA molecule was identified as lncRNA if it had coding
potential score less than 0.5, was longer than 200 bp, and had open reading
frame (ORF) shorter than 100 bp.
Microsatellite identification: The microsatellite
identification tool MISA (Thiel et al. 2003; Beier et al. 2017) was used to predict potential SSRs in the de
novo assembled transcriptome of R. chalepensis. MISA was used to
predict mono-nucleotide, di-nucleotide, tri-nucleotide, tetra-nucleotide,
penta-nucleotide, and hexa-nucleotide SSRs. Search criteria for identification
of potential SSRs were set to 10 repeats for mono-nucleotides, 6 repeats for
di-nucleotides, and 5 repeats for each tri-nucleotide, tetra-nucleotide,
penta-nucleotide, and hexa-nucleotides.
Transcription factors (TFs) identification: To identify potential
TFs in the de novo assembled transcriptome of R. chalepensis,
all the assembled genes were searched against Plant Transcription Factor
Database (PlantTFDB) v5.0 (Tian et al. 2019). BLASTX tool was
used to search assembled nucleotide sequences against TFs protein sequences
with an e-value of 1e-3.
The obtained results showed that all the isolated samples passed the
criteria for sequencing as all the samples were pure without any DNA
contamination as revealed by electrophoresis. Furthermore, NanoDrop report
showed high concentrations ranging from 153 to 718 ng µL-1.
A260/A280 for all samples was above 1.8 except the first replicate of leaves
sample (L1), which showed a ratio of 1.75. Similarly, A260/230 for all samples
ranged between 1.5 to 1.8 indicating no contamination with chaotropic salts or
organic compounds. All the samples showed RIN above 6 indicating high integrity
(Table 1).
Table 1: RNA QC results obtained using NanoDrop and Bioanalyzer
Sample a |
Sample
ID |
Conc (ng
µL-1) |
Amount
(µg) |
260/280 |
260/230 |
RIN |
L1 |
TZTR181213022 |
153 |
3.470 |
1.75 |
1.75 |
6.6 |
L2 |
TZTR181213023 |
436 |
6.540 |
2.04 |
1.76 |
6.3 |
L3 |
TZTR181213024 |
718 |
10.770 |
2.10 |
1.61 |
6.5 |
S1 |
TZTR181213025 |
514 |
7.710 |
2.12 |
1.81 |
8.5 |
S2 |
TZTR181213026 |
628 |
9.420 |
2.08 |
1.78 |
9.6 |
S3 |
TZTR181213029 |
452 |
6.780 |
2.11 |
1.81 |
9.0 |
R1 |
TZTR181213028 |
187 |
2.905 |
2.00 |
1.57 |
6.2 |
R2 |
TZTR181213027 |
321 |
4.815 |
2.02 |
1.80 |
9.0 |
R3 |
TZTR181213030 |
180 |
2.500 |
1.85 |
1.51 |
7.3 |
a L1, L2, L3: leaves; S1, S2, S3: stems; R1, R2, R3: roots
Table 2: Detailed statistics for the quality of sequencing data
Sample a |
Raw reads |
Cleaned reads |
Raw bases (Gb) |
Clean bases (Gb) |
Effective rate (%) b |
GC (%) |
L1 |
42,008,314 |
38,359,520 |
6.3 |
5.8 |
91.31 |
46 |
L2 |
47,128,552 |
42,897,490 |
7.1 |
6.4 |
91.02 |
46 |
L3 |
44,570,714 |
40,629,030 |
6.7 |
6.1 |
91.16 |
46 |
S1 |
49,951,230 |
45,807,238 |
7.5 |
6.9 |
91.70 |
45 |
S2 |
59,231,140 |
53,585,430 |
8.9 |
8.0 |
90.47 |
45 |
S3 |
54,594,866 |
49,705,054 |
8.2 |
7.5 |
91.04 |
45 |
R1 |
43,498,252 |
39,333,020 |
6.5 |
5.9 |
90.42 |
44 |
R2 |
42,389,424 |
37,992,108 |
6.4 |
5.7 |
89.63 |
45 |
R3 |
42,943,700 |
38,664,740 |
6.4 |
5.8 |
90.04 |
45 |
Total |
426,316,192 |
386,973,630 |
63.9 |
58.0 |
- |
- |
a L1, L2, L3: leaves; S1, S2, S3: stems; R1, R2, R3: roots
b Effective rate = clean reads / raw reads × 100
Fig. 1: Length distribution of R. chalepensis (a) de novo
assembled transcripts and (b) longest ORF of each transcript
The total amount of raw data generated from three different replicates
for each of the studied tissues equaled 63.9 Gb; however, after cleaning
(filtration and correction) total kept data was 58.0 Gb. The total generated
raw reads were 213 million PE reads (426 million reads). The total kept reads
after cleaning reached roughly 193.5 million PE reads (387 million reads). The
average effective rate calculated as number of cleaned reads over number of raw
reads was 90.76% (Table 2). No major
changes were observed in GC contents in different sequence replicates for the
studied tissues after cleaning process. In all cases, no change was observed in
GC content after filtration and correction except in the third replicate of
roots (R3) where it was increased from 44 to 45% in raw data and cleaned data,
respectively.
Table 3: Summary of de novo assembly of R. chalepensis whole
transcriptome using Trinity
Statistic |
Total |
Based on all genes |
Based on longest isoform |
Assembled
genes |
145,018 |
|
|
Assembled
transcripts |
254,685 |
|
|
GC (%) |
42.60 |
|
|
Median
contig length |
|
584 |
347 |
Average
contig length |
|
1,183.54 |
691.74 |
Total
assembled bases |
|
301,430,269 |
106,540,322 |
Contig
N10 |
|
4,938 |
4,461 |
Contig
N20 |
|
3,880 |
3,297 |
Contig
N30 |
|
3,215 |
2,542 |
Contig
N40 |
|
2,706 |
1,872 |
Contig
N50 |
|
2,280 |
1,221 |
Fig. 2: The
closest organisms (n = 10) to R.
chalepensis based on similar sequences identified in (a) SwissProt and (b)
UniRef90 databases
A total of 154,018 genes (unigenes) were assembled with GC content of
42.60%. These unigenes were represented by 254,685 transcripts (isoforms). The
average length of assembled contigs based on all transcript contigs was
1,183.54 bp, while the average length based on longest isoform per gene was
691.74 bp. Similarly, the median length of assembled contigs was 584 and 347 bp
based on all contigs and longest isoform per gene, respectively. The contig N50
value based on all assembled genes was 2280 bp (Table 3). The results of assembly quality measures showed high quality of
the resulting transcriptome (supplementary material S1).
Identification of coding transcripts: Out of 254,685 de
novo assembled transcripts, 202,961 (79.7%) had at least one BLAST result
against either UniRef90 or SwissProt databases. The lengths of all searched
sequences ranged from 184 to 15,955 bp with the majority ranging from 200 to
5,000 bp (Fig. 1a). Lengths of longest
ORFs ranged from 17 to 5,126 bp with mean length of roughly 204 bp. Majority of
longest ORFs (93.4%) had lengths between 50 and 1,000 bp (Fig. 1b). A total of 202,961 de novo
assembled transcripts had at least one blast result after searching against
either SwissProt or UniRef90 databases. Out of them, 139,400 transcripts had
results (hits) in agreement with longest ORF. Aligning the assembled
transcripts against SwissProt database led to annotation of 138,244 assembled
transcripts, of them 67,634 and 70,610 were aligned on positive and negative
strands, respectively. On the other hand, 198,704 assembled transcripts were
aligned to different hits in UniRef90 database with 96,979 alignments on the
positive strand and 101,725 alignments on the negative strand.
Identification of the closest organisms: Based on alignments
obtained in SwissProt database, Arabidopsis thaliana, Oryza
sativa subsp. japonica and Nicotiana tabacum were the closest
organism to R. chalepensis with 69,175, 4,487 and 1,329 similar
proteins, respectively (Fig. 2a).
Similarly, different species belong to Citrus genus including orange (C.
sinensis), clementine (C. clementina), and satsuma
mandarin (C. unshiu) were the closest organisms to R.
chalepensis with a total of 78,431 similar sequences (Fig. 2b) based on UniRef90 database annotations.
Protein conserved domains annotation: The obtained results
showed the presence of 143,262 conserved domains. Occurrence percentage of the
top 15 most occurred conserved domain is shown in Fig. 3. It was found that the most prominent conserved domains were
protein kinase domain (pfam00069; 2.83%), protein tyrosine kinase (pfam07714;
1.72%), PPR repeat family (pfam13041; 0.85%), RNA recognition motif (pfam00076;
0.74%), and cytochrome P450 (pfam00067; 0.65%).
GO annotation: Annotated sequences in
UniRef90 database revealed that the most enriched BP GO terms in R.
chalepensis de novo assembled transcriptome were regulation of
transcription (GO:0006355), translation (GO:0006412), and carbohydrate
metabolic process (GO:0005975) (Fig. 4).
Interestingly, flavonoid biosynthetic process (GO:0009813) and flavonoid
metabolic process (GO:0009812) were enriched with 55 and 6 different genes,
respectively. Moreover, L-phenylalanine biosynthetic process (GO:0009094),
cinnamic acid biosynthetic process (GO:0009800), and flavonol biosynthetic
process (GO:0051555) was enriched with 47, 12, and 3 different genes,
respectively.
In terms of cellular component (CC), the most
enriched terms included integral component of membrane (GO:0016021), nucleus
(GO:0005634), and cytoplasm (GO:0005737). Furthermore, the top enriched MF GO
terms based on the annotated proteins of R. chalepensis transcriptome
were mainly related to binding functions and protein kinase activities (Fig. 4). In this category, phenylalanine
ammonia-lyase (PAL) activity (GO:0045548) was enriched by 7 different genes.
Four and 12 different genes were found in trans-cinnamate
4-monooxygenase activity (GO:0016710) and 4-coumarate-CoA ligase activity (GO:0016207),
respectively. Naringenin-chalcone synthase activity (GO:0016210) and chalcone
isomerase activity (GO:0045430) were enriched by 27 and 3 different genes,
respectively. Flavonol synthase activity (GO:0045431), flavonol
3-O-glucosyltransferase activity (GO:0047893), and flavonol-3-O-glucoside
glucosyltransferase activity (GO:0033838) were enriched by 8, 6 and 4 genes,
respectively. Sixteen (16) genes were annotated in the term of flavonoid
3’-monooxygenase activity (GO:0016711). Quercetin 3’-O-glucosyltransferase
activity (GO:0080043) were enriched by 46 different genes, respectively.
The most enriched BP GO terms via CD identified
in R. chalepensis assembled transcripts were oxidation-reduction process
(GO:0055114), protein phosphorylation (GO:0006468), regulation of transcription
(GO:0006355), and transmembrane transport (GO:0055085) with percentages of
occurrence exceeding 7% (Fig. 5).
Interestingly, L-phenylalanine biosynthetic process (GO:0009094) and
L-phenylalanine catabolic process (GO:0006559) were enriched by 33 and 20
different genes, respectively. Enriched CC GO terms included membrane
(GO:0016020) with 4,743 genes, integral component of membrane (GO:0016021) with
3,713 genes, and ribosome (GO:0005840) with 2,139 genes (Fig. 5). ATP binding (GO:0005524), protein kinase activity (GO:0004672), and protein binding
(GO:0005515) were the most enriched MF terms in the conserved domains of R.
chalepensis.
Fig. 3: The
most occurred conserved domains (n = 15) identified in the de novo assembled transcriptome of R. chalepensis
Fig. 4: The
top 10 enriched biological process (BP), cellular component (CC), and molecular
function (MF) GO terms by the de novo
assembled transcripts of R. chalepensis
Pathway enrichment: Pathway enrichment was
analyzed based on the annotation results against SwissProt database on two
different levels of pathways (Fig. 6).
The most enriched level 1 pathways were mainly related to fundamental processes
including protein modification, amino-acid biosynthesis, and lipid metabolism.
On that level, flavonoid metabolism pathway was enriched by 31 different de
novo assembled genes. Similarly, top enriched level 2 pathways included
protein ubiquitination, protein glycosylation, and glycolysis. Interestingly,
analysis of level 2 pathway enrichment revealed that flavonoid biosynthesis pathway
was enriched by 116 genes (Fig. 6).
Fig. 5: The top 10 enriched biological
process (BP), cellular component (CC), and molecular function (MF) GO terms by
the identified conserved domains in R.
chalepensis
Fig. 6: The
most enriched level 1 and 2 pathways by the de
novo assembled transcripts of R. chalepensis
Microsatellite identification: A total of 45,367 of
SSRs were identified in the de novo assembled transcriptome of R.
chalepensis. These SSRs were identified in 27,443 sequences with 10,162
sequences have more than 1 SSR. If the same SSR were repeated within 100 bp of
the sequences, it was considered as compound SSR. This rule identified 7,291
SSRs that present in compound formation. The mono-nucleotide SSRs were the most
identified with 32769 (72.23%) occurrences, while the penta-nucleotide SSRs
were least abundant with only 165 (0.36%) occurrences (Table 4).
Table 4: Number of identified SSRs in the de novo assembled transcriptome
of R. chalepensis
SSR |
Occurrences |
Percentage |
Mono-nucleotide |
32,769 |
72.23% |
Di-nucleotide |
5,495 |
12.11% |
Tri-nucleotide |
6,113 |
13.47% |
Tetra-nucleotide |
537 |
1.18% |
Penta-nucleotide |
165 |
0.36% |
Hexa-nucleotide |
288 |
0.63% |
Total |
45,367 |
100.00% |
Table 5: Number of transcripts/isoforms in the de novo assembled
transcriptome of R. chalepensis that represent each enzyme involved in
the biosynthesis of rutin
EC ID |
Enzyme
description |
Abbreviation |
Occurrences |
4.3.1.24 |
Phenylalanine
ammonia-lyase |
PAL |
13 |
1.14.14.91 |
trans-Cinnamate 4-monooxygenase |
C4H |
6 |
6.2.1.12 |
4-coumarate
CoA ligase |
4CL |
22 |
2.3.1.74 |
Chalcone
synthase |
CHS |
29 |
5.5.1.6 |
Chalcone
isomerase |
CHI |
6 |
1.14.11.9 |
Flavonol
synthase |
FLS |
45 |
1.14.20.6 |
Flavanone-3-
dioxygenase |
F3H |
12 |
1.14.14.82 |
Flavonoid-3’-
monooxygenase |
F3'H |
13 |
2.4.1.91 |
Flavonol
3-O-glucosyltransferase |
- |
8 |
Fig. 7: The
most represented TFs families in the de
novo assembled transcriptome of R. chalepensis
lncRNAs identification: lncRNAs were identified
as the RNA molecule that was longer than 200 bp with coding potential score
less than 0.5 and longest ORF shorter than 100 bp. Following these rules,
46,213 de novo assembled transcripts were identified as potential
lncRNAs. Identified lncRNAs were generally shorter than protein coding
transcripts with average transcript length of 813 and 2497 bp, respectively.
TFs identification: A total of 25,312
potential TFs were identified in the de novo assembled transcriptome of R.
chalepensis. The most represented TF families were bHLH family with
2,719 genes followed by MYB_related with 1,722 genes, NAC family with 1,642
genes, ERF family with 1,447 genes, C2H2 family with 1292 genes, and C3H family
with 1,011 genes. Furthermore, other 6 TFS families were represented by more
than 750 different genes (Fig. 7).
Identification of enzymes involved in rutin
biosynthesis: A total of 49,270 genes/transcripts were annotated to be associated
with at least one enzyme and their respected EC ids were assigned. Enzymes
known to be involved in the biosynthesis of rutin were examined. The number of
transcripts/isoforms represent each enzyme involved in the biosynthesis of
rutin was identified (Table 5).
Surprisingly, flavonol-3-O-glucoside L-rhamnosyltransferase enzyme was not
represented in any of the de novo assembled transcripts of R.
chalepensis. However, 8 different transcripts were annotated as an isoform
for a putative UDP-rhamnose rhamnosyltransferase identified in Fragaria ananassa
and Pistacia vera.
Application of RNA-Seq techniques to identify potential genes involved
in biosynthesis of natural products of different plants has been recently
increased and well-studied. Using NGS techniques, tons of transcriptomic data can
be generated to examine the changes happen in transcription landscape of the
plant during different stages or the variation in transcription status between
different plant’s organs. However, one of the critical limitations facing
RNA-Seq applications is the quality of the generated sequences (Del Fabbro et
al. 2013). Therefore, several quality check and control tools has
been developed including, FastQC, Cutadapt, and Trimmomatic, etc. In the
current study, application of different quality control measures e.g. trimming
of reads, removal of reads containing adapters or high N percentage, and
removal of low-quality reads significantly enhanced the quality of the
generated RNA-Seq data. However, in all these samples, fluctuations in per base
sequence content were observed in the first 6 – 7 bases which is common in RNA
sequencing because of primer amplification bias and seems not to affect any
further analysis of data (Williams et al. 2016).
The de novo assembled transcriptome of R.
chalepensis in the current study had a total length of 301 million base
pairs (Mb). Based on flowcytometric analysis, R. chalepensis plants were
reported to have 40 chromosomes (2n) and a total genome length of 686 Mb (Guerra 1984). In sweet orange (C. sinensis),
which is the closest relative to R. chalepensis with whole genome
sequence available, less than 50% of the whole genome is transcribed into
different kinds of RNA (Xu et al. 2013). Furthermore, our analyses showed that average
transcript length is 1,183 bp. This agrees with the average coding transcript
length in sweet orange with 1,255 bp (Xu et al. 2013). Further quality
check measures were applied to examine the quality of de novo assembled
transcriptome of R. chalepensis. It was found that a total of 13,316
proteins in the SwissProt database had covered the de novo assembled
transcripts of R. chalepensis by more than 80% of their respective
lengths. This could be an indicator for the quality of the assembled
transcriptome as most of the transcripts represent nearly full-length proteins.
Afterwards, all the generated RNA-Seq reads were aligned back to de novo
assembled transcriptome to examine their representation in the transcriptome.
The alignment rate for the 9 replicates obtained from the different studied
tissues (leaves, stems, and roots) ranged from 96.51 to 97.74%, indicating high
representation of reads and high quality of the assembled transcriptome. BUSCO
analysis is ideal for quantifications of completeness of transcriptome assembly
(Waterhouse
et al. 2012; Simão et al. 2015).
Analyzing
the de novo assembled transcriptome of R. chalepensis with BUSCO
revealed that roughly 96% of BUSCOs in the Embryophyta (odb10) dataset
were found in in their complete form while only 0.01% was missed. Moreover,
analysis of transcriptome quality using strand specificity analysis showed that
de novo assembled transcriptome of R. chalepensis is of
high-quality and could be used for further analysis.
Functional annotation of the de novo
assembled transcriptome of R. chalepensis led to annotation of 79.7% of
the assembled transcripts. Based on these annotations, the closest organism to R.
chalepensis i.e., the organism that shared the highest number of protein
isoforms was A. thaliana based on SwissProt results. This result
is reasonable as A. thaliana is the most represented plant in the
SwissProt database (Bairoch and Apweiler 2000).
On the other hand, based on the UniRef90 results, different species belonging
to Citrus genus were identified as the closest organisms to R.
chalepensis. According to the APG IV classification system (The Catalogue of Life Partnership 2017), R.
chalepensis belongs to the Rutaceae family that includes Citrus
genus. Furthermore, P. vera and Acer yangbiense of
the family Sapindaceae were amongst the closest organisms to R. chalepensis.
Rutaceae and Sapindaceae belong to the same order (Sapindales) according to the
APG IV classification system (The Catalogue of
Life Partnership 2017).
In the present study, 143,262 protein conserved
domains were identified in the de novo assembled transcriptome of R.
chalepensis. Protein conserved domains have different roles, especially in the vital
processes. For example, protein kinase domain including tyrosine kinase domains
have some catalytic functions (Gomperts et al. 2009). In plants,
cytochrome P450 plays a critical role in the biosynthesis of secondary
metabolites,
especially flavonoids (Ayabe and Akashi 2006).
Furthermore, it could play prominent roles in production of hormones and
defensive compounds (Saxena et al. 2013). A significant number of secondary metabolites
that cytochrome P450 involved in their biosynthesis characterized as inhibitors
for different pathogens, herbivores, and insects (Kaspera and Croteau 2006).
The GO annotation of R. chalepensis de novo
assembled transcriptome in the present study revealed that flavonoid
biosynthetic process, flavonoid metabolic process, and flavonol biosynthetic
process were enriched. Furthermore, L-phenylalanine and cinnamic acid
biosynthetic processes were enriched in de novo assembled transcriptome.
Phenylalanine is a precursor for different secondary metabolites including
flavonoids (Comino et al. 2007). Furthermore, naringenin chalcone (the main
pre-cursor of rutin) is mainly formed from 4-coumaroyl-CoA resulting mainly
from cinnamic acid (Panwar et al. 2012). Our results showed that the activities of
PAL, trans-cinnamate 4-monooxygenase, 4-coumarate-CoA ligase,
naringenin-chalcone synthase, chalcone isomerase, flavonol synthase, and
flavonol 3-O-glucosyltransferase were enriched in the de novo assembled
transcriptome of R. chalepensis indicating the enrichment of rutin
biosynthesis. Rutin in plants is biosynthesized mainly via flavonoid and
flavonol biosynthesis pathways (Liu et al. 2019; Yi et al. 2019). Further analysis of pathway enrichment in the
current study showed that flavonoid metabolism and flavonoid biosynthesis
pathway were enriched by 31 and 116 different de novo assembled genes,
respectively.
SSRs are very important tools in biotechnological
studies as they provide several polymorphic DNA markers (Tautz 1989) that could be utilized in studies of genomic mapping (Hearne et al.
1992) and phylogenetic analysis (Queller et al. 1993). In the current
study, we identified 45,367 SSRs in R. chalepensis with different lengths
ranging from 1 to 6 nucleotides.
Analysis of enzymes in relation to the
biosynthesis of rutin revealed that all enzymes known to be involved in this
process were abundant in the de novo assembled transcriptome of R. chalepensis. Our results showed that
162 different transcripts in relation to rutin biosynthesis exist in R.
chalepensis. In Asparagus officinalis, 57 genes were
identified to be involved in rutin biosynthesis including C4H, CHS,
and F3'H (Yi et al. 2019). Similarly, in tartary buckwheat, 47 genes
were identified in relation to biosynthesis of rutin including PAL, CHS,
CHI, FLS, and F3'H (Liu et al. 2018).
Whole transcriptome of R. chalepensis plant was
sequenced, and de novo assembled using available bioinformatics methods.
The results showed that R. chalepensis characterized by diverse and rich
transcriptome landscape containing several protein coding and long non-coding
RNAs. Several SSRs and TFs were identified in R. chalepensis.
Moreover, analysis of assembled transcriptome reveals different genes involved
in the biosynthesis of rutin in this plant. Further studies to examine the
parts of R. chalepensis in which rutin is mainly biosynthesized and the
relation with transcriptomic changes are recommended.
The authors are thankful to the Research Supporting Project number
(RSP-2020/86), King Saud University, Riyadh, Saudi Arabia.
EMA and MF planned the experiments, EMA and AAQ performed the lab work,
EMA performed the computational analysis and illustrations, EMA, AAA and MF
interpreted the results, EMA and MF made the write up, AAA and AAQ reviewed the
manuscript.
Al-Majmaie S, L Nahar, GP Sharples,
K Wadi, SD Sarker (2019). Isolation and antimicrobial activity of rutin and its
derivatives from Ruta chalepensis (Rutaceae)
growing in Iraq. Rec Nat Prod 13:64–70
Almutairi MM, WA Alanazi, MA
Alshammari, MR Alotaibi, AR Alhoshani, SS Al-Rejaie, MM Hafez, OA Al-Shabanah
(2017). Neuro-protective effect of rutin against Cisplatin-induced neurotoxic
rat model. BMC Compl Altern Med
17:Article 472
Ayabe SI, T Akashi (2006).
Cytochrome P450s in flavonoid metabolism. Phytochem
Rev 5:271–282
Azevedo MI, AF Pereira, RB
Nogueira, FE Rolim, GAC Brito, DVT Wong, RCP Lima-Júnior, R de Albuquerque
Ribeiro, ML Vale (2013). The antioxidant effects of the flavonoids rutin and
quercetin inhibit oxaliplatin-induced chronic painful peripheral neuropathy. Mol Pain 9; Article 53
Bairoch A, R Apweiler (2000). The
SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucl Acids Res 28:45–48
Beier S, T Thiel, T Münch, U
Scholz, M Mascher (2017). MISA-web: A web server for microsatellite prediction.
Bioinformatics 33:2583–2585
Bertrand YJK, A Petri, AC Scheen, M
Topel, B Oxelman (2018). De novo transcriptome
assembly, annotation, and identification of low-copy number genes in the
flowering plant genus Silene (Caryophyllaceae).
BioRxiv; Article 290510
Catalogue of Life Partnership (2017).
https://www.gbif.org/dataset/7ddf754f-d193-4cc9-b351-99906754a03b (Accessed on:
August 21 2020)
Cegan R, V Hudzieczek, R Hobza
(2017). De novo transcriptome
assembly of heavy metal tolerant Silene
dioica. Genom Data 11:118–119
Comino C, S Lanteri, E Portis, A
Acquadro, A Romani, A Hehn, R Larbat, F Bourgaud (2007). Isolation and
functional characterization of a cDNA coding a hydroxycinnamoyltransferase
involved in phenylpropanoid biosynthesis in Cynara
cardunculus L. BMC Plant Biol 7;
Article 14
Deihimi M, S Moghbelinejad, R
Najafipour, K Parivar, M Nassiri-Asl (2017). The effects of rutin on the gene
expression of Dazl, Bcl2, and Caspase3 in idarubicin-induced testicular damages in mice. Iran Red Crescent Med J 19; Article
e44765
Del Fabbro C, S Scalabrin, M
Morgante, FM Giorgi (2013). An extensive evaluation of read trimming effects on
Illumina NGS data analysis. PLoS One
8; Article e85024
Duarte R, A Carvalho, D Gadelha, V
Braga (2014). Rutin reduces oxidative stress in animals with renovascular
hypertension. BMC Proc 8; Article P65
Emam A, M Eweis, M Elbadry (2010).
A new furoquinoline alkaloid with antifungal activity from the leaves of Ruta chalepensis L. Drug Discov Ther 4:399–404
Ganeshpurkar A, AK Saluja (2017).
The pharmacological potential of rutin. Saudi
Pharm J 25:149–164
Gautam R, M Singh, S Gautam, JK
Rawat, SA Saraf, G Kaithwas (2016). Rutin attenuates intestinal toxicity
induced by Methotrexate linked with anti-oxidative and anti-inflammatory
effects. BMC Compl Altern Med 16;
Article 99
Gomperts BD, IM Kramer, PER Tatham
(2009). Chapter 24 - Protein domains and signal transduction. In: Signal Transduction, 2nd
edn, pp:763–790. Gomperts BD, IM
Kramer, PER Tatham (Eds.). Academic Press, San Diego, California, USA
Grabherr MG, BJ Haas, M Yassour, JZ
Levin, DA Thompson, I Amit, X Adiconis, L Fan, R Raychowdhury, Q Zeng, Z Chen,
E Mauceli, N Hacohen, A Gnirke, N Rhind, F di Palma, BW Birren, C Nusbaum, K
Lindblad-Toh, N Friedman, A Regev (2011). Full-length transcriptome assembly
from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652
Guerra MDS (1984). Cytogenetics of
Rutaceae. II. Nuclear DNA content. Caryologia
37:219–226
Günaydin K, S Savci (2005).
Phytochemical studies on Ruta chalepensis
(LAM.) Lamarck. Nat Prod Res
19:203–210
Hafez MM, NO Al-Harbi, AR
Al-Hoshani, KA Al-hosaini, SD Al Shrari, SS Al Rejaie, MM Sayed-Ahmed, OA
Al-Shabanah (2015). Hepato-protective effect of rutin via IL-6/STAT3 pathway in
CCl4-induced hepatotoxicity in rats. Biol Res 48; Article 30
Hearne CM, S Ghosh, JA Todd (1992).
Microsatellites for linkage analysis of genetic traits. Trends Genet 8:288–294
Huang X, M Xiao, J Xi, C He, J
Zheng, H Chen, J Gao, S Zhang, W Wu, Y Liang, L Xie, K Yi (2019). De Novo transcriptome assembly of Agave H11648 by Illumina sequencing and
identification of cellulose synthase genes in Agave species. Genes 10;
Article 103
Huang X, J Yao, Y Zhao, D Xie, X
Jiang, Z Xu (2016). Efficient rutin and quercetin biosynthesis through
flavonoids-related gene expression in Fagopyrum
tataricum Gaertn. hairy root cultures with UV-B irradiation. Front Plant Sci 7; Article 63
Jahan S, A Munawar, S Razak, S
Anam, QU Ain, H Ullah, T Afsar, M Abulmeaty, A Almajwal (2018). Ameliorative
effects of rutin against cisplatin-induced reproductive toxicity in male rats. BMC Urol 18; Article 107
Kang YJ, DC Yang, L Kong, M Hou, YQ
Meng, L Wei, G Gao (2017). CPC2: A fast and accurate coding potential
calculator based on sequence intrinsic features. Nucl Acids Res 45:W12-W16
Kaspera R, R Croteau (2006).
Cytochrome P450 oxygenases of taxol biosynthesis. Phytochem Rev 5:433–444
Kerr SC, F Gaiti, M Tanurdzic
(2019). De Novo plant transcriptome
assembly and annotation using Illumina RNA-Seq Reads. In: Plant Long Non-Coding RNAs: Methods and Protocols, pp:265–275. Chekanova JA, H-LV Wang (eds.). Springer, New
York, USA
Khan RA, MR Khan, S Sahreen
(2012a). CCl4-induced hepatotoxicity: Protective effect of rutin on
p53, CYP2E1 and the antioxidative status in rat. BMC Compl Altern Med 12; Article 178
Khan RA, MR Khan, S Sahreen
(2012b). Protective effects of rutin against potassium bromate induced
nephrotoxicity in rats. BMC Compl Altern
Med 12; Article 204
Liu M, Z Ma, T Zheng, W Sun, Y Zhang, W Jin, J Zhan, Y Cai, Y Tang, Q Wu,
Z Tang, T Bu, C Li, H Chen (2018). Insights into
the correlation between Physiological changes in and seed development of
tartary buckwheat (Fagopyrum tataricum Gaertn.).
BMC Genomics19; Article 648
Liu YY, XR Chen, JP Wang, WQ Cui,
XX Xing, XY Chen, WY Ding, BO God’spower, N Eliphaz, MQ Sun, YH Li (2019).
Transcriptomic analysis reveals flavonoid biosynthesis of Syringa oblata Lindl. in response to different light intensity. BMC Plant Biol 19; Article 487
Loizzo MR, T Falco, M Bonesi, V
Sicari, R Tundis, M Bruno (2018). Ruta
chalepensis L. (Rutaceae) leaf extract: Chemical composition, antioxidant
and hypoglicaemic activities. Nat Prod
Res 32:521–528
Martin M (2011). Cutadapt removes
adapter sequences from high-throughput sequencing reads. EMBnet J 17:10–12
Moreton J, A Izquierdo, RD Emes
(2016). Assembly, assessment, and availability of de novo generated eukaryotic transcriptomes. Front Genet 6; Article 361
Morgat A, E Coissac, E Coudert, KB
Axelsen, G Keller, A Bairoch, A Bridge, L Bougueleret, I Xenarios, A Viari
(2011). UniPathway: A resource for the exploration and annotation of metabolic
pathways. Nucl Acids Res 40:D761–D769
Musacchia F, S Basu, G Petrosino,
M Salvemini, R Sanges (2015). Annocript: A flexible pipeline for
the annotation of transcriptomes able to identify putative long noncoding RNAs.
Bioinformatics 31:2199–2201
Pan RY, J Ma, XX Kong, XF Wang, SS
Li, XL Qi, YH Yan, J Cheng, Q Liu, W Jin, CH Tan, Z Yuan (2019). Sodium rutin
ameliorates Alzheimer’s disease-like pathology by enhancing microglial
amyloid-β clearance. Sci Adv 5;
Article eaau6328
Panwar A, N Gupta, RS Chauhan
(2012). Biosynthesis and accumulation of flavonoids in Fagopyrum spp. Eur J Plant
Sci Biotechnol 6:17–26
Queller DC, JE Strassmann, CR
Hughes (1993). Microsatellites and kinship. Trends
Ecol Evol 8:285–288
Rai A, T Nakaya, Y Shimizu, M Rai, M
Nakamura, H Suzuki, K Saito, M Yamazaki (2018). De novo transcriptome assembly and characterization
of Lithospermum officinale to
discover putative genes involved in specialized metabolites biosynthesis. Planta Med 84:920–934
Saxena A, P Singh, DK Yadav, P
Sharma, S Alam, F Khan, ST Thul, RK Shukla, V Gupta, NS Sangwan (2013).
Identification of cytochrome P450 heme motif in plants proteome. Plant Omics J 6:1–12
Seppey M, M Manni, EM
Zdobnov (2019). BUSCO: Assessing genome assembly and annotation completeness. In: Gene Prediction: Methods and Protocols, pp:227–245. Kollmar M (Ed.). Springer, New York, USA
Simão FA, RM Waterhouse, P Ioannidis, EV Kriventseva, EM Zdobnov (2015). BUSCO: Assessing genome assembly and annotation completeness
with single-copy orthologs. Bioinformatics
31:3210–3212
Song L, L Florea (2015).
Rcorrector: Efficient and accurate error correction for Illumina RNA-seq reads.
GigaScience 4; Article 48
Tautz D (1989). Hypervariability of
simple sequences as a general source for polymorphic DNA markers. Nucl Acids Res 17:6463–6471
The Catalogue of Life Partnership
(2017). APG IV: Angiosperm Phylogeny Group classification for the orders and
families of flowering plants. Available at GBIF.org
The Rnacentral Consortium (2018).
RNAcentral: A hub of information for non-coding RNA sequences. Nucl Acids Res 47:D221–D229
Thiel T, W Michalek, R Varshney, A
Graner (2003). Exploiting EST databases for the development and
characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor
Appl Genet 106:411–422
Tian F, DC Yang, YQ Meng, J Jin, G
Gao (2019). PlantRegMap: Charting functional regulatory maps in plants. Nucl Acids Res 48:D1104–D1113
Ungaro A, N Pech, JF Martin, RJS
McCairns, JP Mévy, R Chappaz, A Gilles (2017). Challenges and advances for
transcriptome assembly in non-model species. PLoS One 12; Article e0185020
Wang Z, M Gerstein, M Snyder
(2009). RNA-Seq: A revolutionary tool for transcriptomics. Nat Rev Genet 10:57–63
Waterhouse RM, F Tegenfeldt, J Li,
EM Zdobnov, EV Kriventseva (2012). OrthoDB: A hierarchical catalog of animal,
fungal and bacterial orthologs. Nucl
Acids Res 41:D358–D365
Wernersson R (2006). Virtual
ribosome—a comprehensive DNA translation tool with support for integration of
sequence feature annotation. Nucl Acids
Res 34:W385–W388
Williams CR, A Baccarella, JZ
Parrish, CC Kim (2016). Trimming of sequence reads alters RNA-Seq gene
expression estimates. BMC Bioinform
17; Article 103
Xu Q, LL Chen, X Ruan, D Chen, A
Zhu, C Chen, D Bertrand, WB Jiao, BH Hao, MP Lyon, J Chen, S Gao, F Xing, H
Lan, JW Chang, X Ge, Y Lei, Q Hu, Y Miao, L Wang, S Xiao, MK Biswas, W Zeng, F
Guo, H Cao, X Yang, XW Xu, YJ Cheng, J Xu, JH Liu, OJ Luo, Z Tang, W-W Guo, H
Kuang, HY Zhang, ML Roose, N Nagarajan, XX Deng, Y Ruan (2013). The draft
genome of sweet orange (Citrus sinensis).
Nat Genet 45:59–66
Yi TG, YR Yeoung, IY Choi, NI Park
(2019). Transcriptome analysis of Asparagus
officinalis reveals genes involved in the biosynthesis of rutin and
protodioscin. PLoS One 14; Article
e0219973
Yoshida T, Y Watanabe, S Ishiura
(2019). Production of the herb Ruta
chalepensis L. expressing amyloid β-GFP fusion protein. Proc Jpn Acad Ser B Phys Biol Sci 95:295–302